lopdf
A Rust library for PDF document manipulation.
A useful reference for understanding the PDF file format, and how to use this library, is the PDF 1.7 Reference Document. The PDF 2.0 specification is an ISO standard that you have to pay for, but it is backwards compatible with the freely available 1.7 version.
Example Code
- Create PDF document
use lopdf::dictionary;
use lopdf::{Document, Object, Stream};
use lopdf::content::{Content, Operation};
// `with_version` specifies the PDF version this document complies with.
let mut doc = Document::with_version("1.5");
// Object IDs are used for cross-referencing in PDF documents. `lopdf` helps keep track of them
// for us. Each ID is a simple pair of integers: an object number and a generation number.
// Calls to `doc.new_object_id` and `doc.add_object` return an object ID.
// "Pages" is the root node of the page tree.
let pages_id = doc.new_object_id();
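// Fonts are dictionaries. The Type, Subtype and BaseFont keys
// come straight out of the PDF reference manual.
//
// The `dictionary!` macro is a helper that allows complex
// key-value relationships to be represented in a simpler
// visual manner, similar to a match statement.
// A `Dictionary` is an ordered map from byte-vector keys to `Object` values.
let font_id = doc.add_object(dictionary! {
    "Type" => "Font",
    // Type1 is a simple PostScript font.
    "Subtype" => "Type1",
    // BaseFont is the PostScript name of the font.
    "BaseFont" => "Courier",
});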
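// Font dictionaries need to be added into resource dictionaries
// in order to be used.
// Resource dictionaries can contain more than just fonts,
// but normally just contain fonts.
// Only one resource dictionary is allowed per page tree root.
let resources_id = doc.add_object(dictionary! {
    // Fonts are actually triply nested dictionaries.
    "Font" => dictionary! {
        // F1 is the font name used when writing text. It must be
        // unique in the document, but it does not have to be "F1".
        "F1" => font_id,
    },
});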
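// `Content` is a wrapper struct around a vector of operations.
// Each `Operation` pairs a particular PDF operator with its operands.
// See the PDF reference for more details on these operators and operands.
// Note: here each operator is given before its operands, the reverse of the
// order in which they appear in the PDF file itself.
let content = Content {
    operations: vec![
        // BT begins a text object. It takes no operands.
        Operation::new("BT", vec![]),
        // Tf sets the font (F1 from the resources above) and the font size.
        Operation::new("Tf", vec!["F1".into(), 48.into()]),
        // Td sets the text position. y = 0 is at the bottom of the page.
        Operation::new("Td", vec![100.into(), 600.into()]),
        // Tj shows a string.
        Operation::new("Tj", vec![Object::string_literal("Hello World!")]),
        // ET ends the text object.
        Operation::new("ET", vec![]),
    ],
};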
// Streams are a dictionary followed by a sequence of bytes. What that sequence of bytes
// represents depends on context.
// The stream dictionary is set internally by lopdf and normally doesn't
// need to be manually manipulated. It contains keys such as
// Length, Filter, DecodeParms, etc.
//
// `content` is encoded into the bytes of the stream.
let content_id = doc.add_object(Stream::new(dictionary! {}, content.encode().unwrap()));
// Page is a dictionary that represents one page of a PDF file.
// It has a Type, a Parent and Contents.
let page_id = doc.add_object(dictionary! {
    "Type" => "Page",
    "Parent" => pages_id,
    "Contents" => content_id,
});
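// Again, "Pages" is the root of the page tree. Its ID was already created
// at the top of the example, since we needed it for the Parent entry of the
// page dictionary.
//
// These are just the basic requirements for a page tree root object. There are also many
// additional entries that can be added to the dictionary if needed. Some of these can also be
// defined on the page dictionary itself, instead of being inherited from the page tree root.
let pages = dictionary! {
    "Type" => "Pages",
    // IDs of the pages in the document; normally built with a loop.
    "Kids" => vec![page_id.into()],
    "Count" => 1,
    "Resources" => resources_id,
    // MediaBox is the page size in points: 595 x 842 is A4.
    "MediaBox" => vec![0.into(), 0.into(), 595.into(), 842.into()],
};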
// Using `insert()` here instead of `add_object()`, since the ID is already known.
doc.objects.insert(pages_id, Object::Dictionary(pages));
// Creating document catalog.
// There are many more entries allowed in the catalog dictionary.
let catalog_id = doc.add_object(dictionary! {
    "Type" => "Catalog",
    "Pages" => pages_id,
});
// The Root key in the trailer is set here to the ID of the document catalog;
// the remainder of the trailer is set during `doc.save()`.
doc.trailer.set("Root", catalog_id);
doc.compress();
// Store file in current working directory.
// Note: this line is excluded when running tests.
if false {
    doc.save("example.pdf").unwrap();
}
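The `Content` operations above are written operator-first, while an encoded PDF content stream is postfix: the operands come first and the operator follows (`/F1 48 Tf`). The sketch below shows that serialization using only the standard library. The `Op` and `Operand` types are hypothetical stand-ins, not lopdf's `Operation` and `Object`, and the string escaping is simplified to backslashes and parentheses.

```rust
/// Hypothetical stand-in for a PDF operand (not lopdf's `Object`).
enum Operand {
    Name(String),
    Int(i64),
    Str(String),
}

/// Hypothetical stand-in for one content-stream operation.
struct Op {
    operator: &'static str,
    operands: Vec<Operand>,
}

impl Operand {
    fn encode(&self) -> String {
        match self {
            Operand::Name(n) => format!("/{}", n),
            Operand::Int(i) => i.to_string(),
            // Literal strings go in parentheses; only `\`, `(` and `)` are
            // escaped in this sketch (other escapes are omitted).
            Operand::Str(s) => {
                let escaped: String = s
                    .chars()
                    .flat_map(|c| match c {
                        '\\' | '(' | ')' => vec!['\\', c],
                        _ => vec![c],
                    })
                    .collect();
                format!("({})", escaped)
            }
        }
    }
}

/// Writes each operation on its own line: operands first, then the operator.
fn encode(ops: &[Op]) -> String {
    let mut out = String::new();
    for op in ops {
        for operand in &op.operands {
            out.push_str(&operand.encode());
            out.push(' ');
        }
        out.push_str(op.operator);
        out.push('\n');
    }
    out
}

/// The same operations as the example's `Content`, in the hypothetical types.
fn hello_world() -> Vec<Op> {
    vec![
        Op { operator: "BT", operands: vec![] },
        Op { operator: "Tf", operands: vec![Operand::Name("F1".into()), Operand::Int(48)] },
        Op { operator: "Td", operands: vec![Operand::Int(100), Operand::Int(600)] },
        Op { operator: "Tj", operands: vec![Operand::Str("Hello World!".into())] },
        Op { operator: "ET", operands: vec![] },
    ]
}

fn main() {
    print!("{}", encode(&hello_world()));
}
```

Running it prints the five operations in file order, for example `/F1 48 Tf` and `(Hello World!) Tj`.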
- Merge PDF documents
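use lopdf::dictionary;
use std::collections::BTreeMap;
use lopdf::content::{Content, Operation};
use lopdf::{Document, Object, ObjectId, Stream};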
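Each source document numbers its objects independently, usually starting from 1, so before objects from several files can live in one object map their IDs have to be made unique. A minimal std-only sketch of that single step, assuming a hypothetical, heavily simplified `Obj` type (not lopdf's `Object`): every object number in the second document, and every reference to one, is shifted past the first document's highest object number. The map is keyed by object ID, like `doc.objects` in the example above. A real merge must also combine the page trees and the catalog, which this sketch does not attempt.

```rust
use std::collections::BTreeMap;

/// (object number, generation number)
type ObjectId = (u32, u16);

/// Hypothetical, heavily simplified object model.
#[derive(Debug, PartialEq)]
enum Obj {
    Int(i64),
    Reference(ObjectId),
    Array(Vec<Obj>),
}

/// Adds `offset` to the object number of every reference inside `obj`.
fn shift_refs(obj: &mut Obj, offset: u32) {
    match obj {
        Obj::Reference((num, _)) => *num += offset,
        Obj::Array(items) => items.iter_mut().for_each(|o| shift_refs(o, offset)),
        Obj::Int(_) => {}
    }
}

/// Renumbers all objects (and references to them) so they start after `max_id`.
fn renumber(doc: BTreeMap<ObjectId, Obj>, max_id: u32) -> BTreeMap<ObjectId, Obj> {
    doc.into_iter()
        .map(|((num, generation), mut obj)| {
            shift_refs(&mut obj, max_id);
            ((num + max_id, generation), obj)
        })
        .collect()
}

/// Combines two object maps without ID collisions.
fn merge(a: BTreeMap<ObjectId, Obj>, b: BTreeMap<ObjectId, Obj>) -> BTreeMap<ObjectId, Obj> {
    let max_id = a.keys().map(|(num, _)| *num).max().unwrap_or(0);
    let mut merged = a;
    merged.extend(renumber(b, max_id));
    merged
}

fn main() {
    let a = BTreeMap::from([((1, 0), Obj::Int(7)), ((2, 0), Obj::Reference((1, 0)))]);
    let b = BTreeMap::from([
        ((1, 0), Obj::Array(vec![Obj::Reference((2, 0))])),
        ((2, 0), Obj::Int(9)),
    ]);
    for (id, obj) in &merge(a, b) {
        println!("{:?} => {:?}", id, obj);
    }
}
```

Shifting references together with the keys is the important part: an object that pointed at `2 0 R` in its own file must point at the renumbered ID afterwards.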
- Modify PDF document
use lopdf::Document;
// For this example to work, a parser feature needs to be enabled.
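Once a document has been loaded and a page's content stream decoded into operations, a simple modification is rewriting the strings shown by the `Tj` operator. The std-only sketch below illustrates that idea with hypothetical `Op` and `Operand` types and a hypothetical `replace_text` helper (none of them lopdf's API). It assumes text is stored as plain strings; real files often use font-specific encodings or split text across `TJ` arrays, which this sketch does not handle.

```rust
/// Hypothetical, simplified operand: numbers or already-decoded text.
#[derive(Debug, PartialEq)]
enum Operand {
    Number(f64),
    Str(String),
}

/// Hypothetical operation: an operator name plus its operands.
#[derive(Debug, PartialEq)]
struct Op {
    operator: String,
    operands: Vec<Operand>,
}

/// Replaces `from` with `to` in every string operand of a `Tj` operator.
/// Returns how many operands were changed.
fn replace_text(ops: &mut [Op], from: &str, to: &str) -> usize {
    let mut changed = 0;
    for op in ops.iter_mut().filter(|op| op.operator == "Tj") {
        for operand in op.operands.iter_mut() {
            if let Operand::Str(s) = operand {
                if s.contains(from) {
                    *s = s.replace(from, to);
                    changed += 1;
                }
            }
        }
    }
    changed
}

fn main() {
    let mut ops = vec![
        Op { operator: "Td".into(), operands: vec![Operand::Number(100.0), Operand::Number(600.0)] },
        Op { operator: "Tj".into(), operands: vec![Operand::Str("Hello World!".into())] },
    ];
    let n = replace_text(&mut ops, "Hello World!", "Modified text!");
    println!("{} operand(s) changed: {:?}", n, ops[1]);
}
```

After such an edit the operations would be re-encoded into the page's content stream and the document saved again.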
FAQ
- Why does the library keep everything in memory as high-level objects until finally serializing the entire document?
Normally a PDF document won't be very large, ranging from tens of KB to hundreds of MB, and memory size is not a bottleneck for today's computers. By keeping the whole document in memory, each stream's length can be calculated in advance, so the Length entry can be written directly instead of through a reference to a separate object. The resulting PDF file is smaller to distribute and faster for PDF consumers to process.
A file is produced once but consumed many times, so the extra effort at write time pays off.
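The point about stream length can be made concrete. When the stream data is already in memory, the writer knows its byte count before emitting the dictionary, so `/Length` is a plain integer. A writer that streams data of unknown size would instead have to emit something like `/Length 5 0 R` and write that length object after the data. A std-only sketch of the direct form, with a hypothetical `write_stream` helper (not lopdf's writer):

```rust
/// Serializes an indirect stream object whose data is fully in memory.
/// Because `data.len()` is known up front, /Length is written directly
/// instead of as a reference to a separate length object.
fn write_stream(out: &mut Vec<u8>, obj_num: u32, data: &[u8]) {
    out.extend_from_slice(
        format!("{} 0 obj\n<< /Length {} >>\nstream\n", obj_num, data.len()).as_bytes(),
    );
    out.extend_from_slice(data);
    // The end-of-line marker before `endstream` is not counted in /Length.
    out.extend_from_slice(b"\nendstream\nendobj\n");
}

fn main() {
    let content = b"BT /F1 48 Tf 100 600 Td (Hello World!) Tj ET";
    let mut out = Vec::new();
    write_stream(&mut out, 4, content);
    print!("{}", String::from_utf8_lossy(&out));
}
```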